Data-Centric Governance
Artificial intelligence (AI) governance is the body of standards and
practices used to ensure that AI systems are deployed responsibly. Current AI
governance approaches consist mainly of manual review and documentation
processes. While such reviews are necessary for many systems, they are not
sufficient to systematically address all potential harms, as they do not
operationalize governance requirements for system engineering, behavior, and
outcomes in a way that facilitates rigorous and reproducible evaluation. Modern
AI systems are data-centric: they act on data, produce data, and are built
through data engineering. The assurance of governance requirements must also be
carried out in terms of data. This work explores the systematization of
governance requirements via datasets and algorithmic evaluations. When applied
throughout the product lifecycle, data-centric governance decreases time to
deployment, increases solution quality, decreases deployment risks, and places
the system in a continuous state of assured compliance with governance
requirements.
Comment: 26 pages, 13 figures
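To make "governance requirements via datasets and algorithmic evaluations" concrete, here is a minimal sketch that assumes a requirement can be written as a set of (input, expected output) cases plus a pass threshold. The `Requirement` class, `evaluate` function, and `toy_system` stand-in are hypothetical illustrations, not the authors' tooling.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Requirement:
    """Hypothetical structure: a governance requirement expressed as data plus a pass criterion."""
    name: str
    cases: Sequence[Tuple[str, str]]  # (input, expected output) pairs
    min_accuracy: float               # fraction of cases that must pass


def evaluate(requirement: Requirement, system: Callable[[str], str]) -> Tuple[float, bool]:
    """Run the system on every case; return (accuracy, whether the requirement is met)."""
    hits = sum(1 for x, expected in requirement.cases if system(x) == expected)
    accuracy = hits / len(requirement.cases)
    return accuracy, accuracy >= requirement.min_accuracy


def toy_system(prompt: str) -> str:
    # Hypothetical stand-in model: refuses anything mentioning "address", otherwise answers "OK".
    return "REFUSE" if "address" in prompt else "OK"


privacy = Requirement(
    name="no-personal-data-disclosure",
    cases=[("share alice's address", "REFUSE"),
           ("list bob's home address", "REFUSE"),
           ("summarize this memo", "OK")],
    min_accuracy=1.0,
)
print(evaluate(privacy, toy_system))  # (1.0, True)
```

Because the requirement is plain data, the same check can be re-run on every model revision, which is one way to read the abstract's notion of a "continuous state of assured compliance".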
Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space
We present a novel generative method for producing unseen and plausible
counterfactual examples for reinforcement learning (RL) agents based upon
outcome variables that characterize agent behavior. Our approach uses a
variational autoencoder to train a latent space that jointly encodes
information about the observations and outcome variables pertaining to an
agent's behavior. Counterfactuals are generated using traversals in this latent
space, via gradient-driven updates as well as latent interpolations against
cases drawn from a pool of examples. These include updates to raise the
likelihood of generated examples, which improves the plausibility of generated
counterfactuals. From experiments in three RL environments, we show that these
methods produce counterfactuals that are more plausible and proximal to their
queries compared to purely outcome-driven or case-based baselines. Finally, we
show that a latent jointly trained to reconstruct both the input observations
and behavioral outcome variables produces higher-quality counterfactuals than
latents trained solely to reconstruct the observation inputs.
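The interpolation variant described above can be sketched as follows. This is a simplified illustration under stated assumptions: latents are plain lists of floats, `outcome_head` is a hypothetical stand-in for the outcome predictor read off the jointly trained latent, and decoding back to observations, the gradient-driven updates, and the likelihood-raising steps are all omitted.

```python
import math
from typing import Callable, List, Optional, Sequence

Latent = List[float]


def interpolate(a: Sequence[float], b: Sequence[float], t: float) -> Latent:
    """Linear interpolation (1 - t) * a + t * b in latent space."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_counterfactual(
    z_query: Sequence[float],
    pool: Sequence[Sequence[float]],
    outcome: Callable[[Sequence[float]], float],
    target: float,
    steps: int = 20,
) -> Optional[Latent]:
    """Walk from the query toward each pool case; keep the most proximal latent reaching the target."""
    best: Optional[Latent] = None
    best_dist = math.inf
    for z_case in pool:
        if outcome(z_case) < target:
            continue  # only interpolate toward cases that already achieve the target outcome
        for k in range(1, steps + 1):
            z = interpolate(z_query, z_case, k / steps)
            if outcome(z) >= target:
                d = math.dist(z, z_query)
                if d < best_dist:
                    best, best_dist = z, d
                break  # first crossing on this path is the closest point along it
    return best


def outcome_head(z: Sequence[float]) -> float:
    # Hypothetical toy outcome predictor: reads the first latent coordinate.
    return z[0]


cf = interpolation_counterfactual([0.0, 0.0], [[1.0, 1.0], [2.0, 0.0], [-1.0, 0.0]],
                                  outcome_head, target=0.5)
print(cf)  # [0.5, 0.0]
```

In a full system the returned latent would be decoded into an observation; choosing the crossing point nearest the query reflects the abstract's emphasis on proximal counterfactuals.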
System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
As Artificial and Robotic Systems are increasingly deployed and relied upon
for real-world applications, it is important that they exhibit the ability to
continually learn and adapt in dynamically changing environments, becoming
Lifelong Learning Machines. Continual/lifelong learning (LL) involves
minimizing catastrophic forgetting of old tasks while maximizing a model's
capability to learn new tasks. This paper addresses the challenging lifelong
reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in
L2RL and making L2RL useful for practical applications requires more than
developing individual L2RL algorithms; it requires making progress at the
systems-level, especially research into the non-trivial problem of how to
integrate multiple L2RL algorithms into a common framework. In this paper, we
introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF),
which standardizes L2RL systems and assimilates different continual learning
components (each addressing different aspects of the lifelong learning problem)
into a unified system. As an instantiation of L2RLCF, we develop a standard API
allowing easy integration of novel lifelong learning components. We describe a
case study that demonstrates how multiple independently developed LL components
can be integrated into a single realized system. We also introduce an
evaluation environment in order to measure the effect of combining various
system components. Our evaluation environment employs different LL scenarios
(sequences of tasks) consisting of Starcraft-2 minigames and allows for the
fair, comprehensive, and quantitative comparison of different combinations of
components within a challenging common evaluation environment.
Comment: The Second International Conference on AIML Systems, October 12-15, 2022, Bangalore, India
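The idea of a standard API into which independently developed components plug can be illustrated with a small hook-based sketch. Everything here is hypothetical: `LLComponent`, `TaskLogger`, `ReplayBuffer`, and `LifelongAgent` are illustrative names, not the actual L2RLCF interface, and the task labels are simply example StarCraft II minigame names.

```python
from typing import List, Protocol


class LLComponent(Protocol):
    """Hypothetical hook interface for a pluggable lifelong-learning component."""

    def on_task_start(self, task_id: str) -> None: ...
    def on_task_end(self, task_id: str) -> None: ...


class TaskLogger:
    """Toy component that records which hooks fired, in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_task_start(self, task_id: str) -> None:
        self.events.append(f"start:{task_id}")

    def on_task_end(self, task_id: str) -> None:
        self.events.append(f"end:{task_id}")


class ReplayBuffer:
    """Toy stand-in for a rehearsal component: remembers every finished task."""

    def __init__(self) -> None:
        self.seen_tasks: List[str] = []

    def on_task_start(self, task_id: str) -> None:
        pass  # a real component might begin collecting samples here

    def on_task_end(self, task_id: str) -> None:
        self.seen_tasks.append(task_id)


class LifelongAgent:
    """Dispatches task-boundary events to every registered component."""

    def __init__(self, components: List[LLComponent]) -> None:
        self.components = components

    def run_scenario(self, tasks: List[str]) -> None:
        for task_id in tasks:
            for c in self.components:
                c.on_task_start(task_id)
            # Policy training on task_id would happen here (omitted in this sketch).
            for c in self.components:
                c.on_task_end(task_id)


logger, replay = TaskLogger(), ReplayBuffer()
agent = LifelongAgent([logger, replay])
agent.run_scenario(["MoveToBeacon", "CollectMineralShards"])
print(replay.seen_tasks)  # ['MoveToBeacon', 'CollectMineralShards']
```

Structural typing via `Protocol` lets a new component join the system without inheriting from a shared base class, which keeps independently developed components loosely coupled.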
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
Despite the advancement of machine learning techniques in recent years,
state-of-the-art systems lack robustness to "real world" events, where the
input distributions and tasks encountered by the deployed systems will not be
limited to the original training context, and systems will instead need to
adapt to novel distributions and tasks while deployed. This critical gap may be
addressed through the development of "Lifelong Learning" systems that are
capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3)
Scalability. Unfortunately, efforts to improve these capabilities are typically
treated as distinct areas of research that are assessed independently, without
regard to the impact of each separate capability on other aspects of the
system. We instead propose a holistic approach, using a suite of metrics and an
evaluation framework to assess Lifelong Learning in a principled way that is
agnostic to specific domains or system techniques. Through five case studies,
we show that this suite of metrics can inform the development of varied and
complex Lifelong Learning systems. We highlight how the proposed suite of
metrics quantifies performance trade-offs present during Lifelong Learning
system development: both the widely discussed Stability-Plasticity dilemma and
the newly proposed relationship between Sample-Efficient and Robust Learning.
Further, we make recommendations for the formulation and use of metrics to
guide the continuing development of Lifelong Learning systems and assess their
progress in the future.
Comment: To appear in Neural Networks
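As a concrete example of a metric that exposes the stability side of the Stability-Plasticity trade-off, the sketch below computes final average performance and backward transfer from a task-performance matrix. Note that this uses the backward-transfer definition popularized by Gradient Episodic Memory (GEM), swapped in for illustration; it is not necessarily one of the metrics proposed in this paper, and the matrix values are made up.

```python
from typing import Sequence, Tuple


def lifelong_summary(R: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """R[i][j] = performance on task j after training through task i (a T x T matrix).

    Returns (final average performance, backward transfer)."""
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    final_avg = sum(R[T - 1]) / T
    # Backward transfer: average change on each earlier task between learning it and the end.
    # Negative values indicate forgetting, i.e. a loss of stability.
    bwt = sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)
    return final_avg, bwt


R = [[0.9, 0.1, 0.0],
     [0.7, 0.8, 0.2],
     [0.6, 0.7, 0.9]]
print(tuple(round(v, 3) for v in lifelong_summary(R)))  # (0.733, -0.2)
```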
Adversarial Policy Switching with Application to RTS Games
Complex games such as RTS games are naturally formalized as Markov games. Given a Markov game, it is often possible to hand-code or learn a set of policies that capture the diversity of possible strategies. It is also often possible to hand-code or learn an abstract simulator of the game that can estimate the outcome of playing two strategies against one another from any state. We consider how to use such policy sets and simulators to make decisions in large Markov games. Prior work has considered the problem using an approach we call minimax policy switching. At each decision epoch, all policy pairs are simulated against each other from the current state, and the minimax policy is chosen and used to select actions until the next decision epoch. While intuitively appealing, we show that this switching policy can have arbitrarily poor worst-case performance. In response, we describe a modified algorithm, monotone policy switching, whose worst-case performance, under certain conditions, is provably no worse than that of the minimax fixed policy in the set. We evaluate these switching policies in both a simulated RTS game and the real game Wargus. The results show the effectiveness of policy switching when the simulator is accurate, and also highlight challenges in the face of inaccurate simulations.
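The per-epoch choice made by minimax policy switching can be sketched as below. This sketch covers only the minimax selection step. The `simulate` callable stands in for the abstract simulator, and the payoff table and strategy names are hypothetical. The monotone variant is not sketched because the abstract does not specify its switching rule.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def minimax_policy_choice(
    policies: Sequence[str],
    opponent_policies: Sequence[str],
    simulate: Callable[[str, str], float],
) -> Tuple[Optional[str], float]:
    """Simulate every (ours, theirs) pair from the current state; pick the best worst case."""
    best_policy: Optional[str] = None
    best_value = float("-inf")
    for p in policies:
        worst = min(simulate(p, q) for q in opponent_policies)
        if worst > best_value:
            best_policy, best_value = p, worst
    return best_policy, best_value


# Hypothetical simulator output: estimated value for us when our policy meets theirs.
payoff: Dict[Tuple[str, str], float] = {
    ("rush", "rush"): 0.5, ("rush", "turtle"): 0.2, ("rush", "expand"): 0.8,
    ("turtle", "rush"): 0.7, ("turtle", "turtle"): 0.5, ("turtle", "expand"): 0.3,
    ("expand", "rush"): 0.1, ("expand", "turtle"): 0.6, ("expand", "expand"): 0.5,
}
names = ["rush", "turtle", "expand"]
print(minimax_policy_choice(names, names, lambda p, q: payoff[(p, q)]))  # ('turtle', 0.3)
```

In a switching agent this function would be called once per decision epoch with a simulator seeded from the current state, and the returned policy would select actions until the next epoch. As the abstract notes, re-choosing greedily this way carries no worst-case guarantee.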